Towards Paradigm Peace In Physics 
Education Research 
In his address "The Paradigm Wars and Their Aftermath," delivered over a decade ago at the annual 
AERA meeting in San Francisco, Gage (1989) foresaw three possible histories of education research as 
written in 2009: 

(1) Raging in the 1980's, the Paradigm Wars resulted in the demise of objectivity-seeking 
quantitative research on teaching - a victim of putatively devastating attacks from anti- 
naturalists, interpretivists, and critical theorists. 

(2) Thus from the jungle wars of the 1980’s, educational researchers, including those 
concerned with teaching, emerged into a sunlit plain - a happy and productive arena in 
which the strengths of all three paradigms (objective-quantitative, interpretive-qualitative, 
and critical-theoretical) were abundantly realized, with a corresponding decrease in the 
harmful effects of their respective inadequacies. 

(3) What happened after 1989 in research in teaching was pretty much the same as what 
happened before 1989. The invective and vituperation continued. 

With the benefit of another decade of experience, it may be worthwhile to attempt to predict which (if 
any) of Gage's three scenarios will occur. I think that harbingers of 2009 exist in current physics- 
education research (PER) (Redish, 1999; McDermott & Redish, 1999). This paper summarizes two 
examples and then gives a prediction. 
Example 1. Quantitative Research: A Meta-Analysis 

Data from a survey (Hake, 1998a,b,d; 2000) of pre/post data using standard tests of conceptual 
understanding, recognized for high reliability and validity [Halloun-Hestenes Mechanics Diagnostic test 
(Halloun & Hestenes, 1985) or the more recent Force Concept Inventory (Hestenes, Wells, Swackhamer, 
1992; Halloun, Hake, Mosca, Hestenes, 1995)] for 62 introductory physics courses enrolling a total number 
of students N = 6542, are shown in Fig. 1. Starting in 1992, I requested that pre-/post-FCI data and post-test 
Mechanics Baseline (a problem-solving test due to Hestenes & Wells, 1992) data be sent to me. Since 
instructors are more likely to report higher-gain courses, the detector is biased in favor of those courses, but 
can still answer a crucial research question: Can the use of Interactive Engagement (IE) methods increase 
the effectiveness of introductory mechanics courses well beyond that obtained by traditional methods? 




Fig. 1. The %<Gain> vs %<Pretest> score for 62 courses enrolling a total of N = 6542 students. Slope 
lines «g»14T for the average of the 14 T courses and «g»48IE for the average of the 48 IE courses are 
shown, as explained in the text. (From Hake, 1998a.) 
A histogram of the data of Fig. 1 is shown in Fig. 2. 




Fig. 2. Histogram of the average normalized gain <g>: red bars show the fraction of 14 Traditional 
(T) courses (N = 2084) and green bars show the fraction of 48 Interactive Engagement (IE) courses 
(N = 4458), both within bins of width Δ<g> = 0.04, centered on the <g> values shown. (From Hake, 
1998a.) 

The data of Figs. 1 and 2 show that: 

a. A consistent analysis over diverse student populations in high schools, colleges, and universities is 
obtained if a rough measure of the average effectiveness of a course in promoting conceptual 
understanding is taken to be the average normalized gain <g>. The latter is defined as the ratio of the 
actual average gain (%<Gain>) = (%<posttest> - %<pretest>) to the maximum possible average gain 
(%<Gain>max) = (100 - %<pretest>). It should be noted that for any particular course C with course 
point [(%<Gain>'), (%<pretest>')] on Fig. 1, the absolute value of the slope of a line connecting the C 
course point with the point [(%<Gain>) = 0, (%<pretest>) = 100] is just the normalized gain <g>' for 
course C. Thus other courses lying on that same line all have the same normalized gain <g>' as 
course C and are thus of equal average effectiveness. The maximum possible normalized gain, <g> = 1.0, 
is represented by the 45 degree negative slope line in Fig. 1. 
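
The definition of <g> in point "a" can be made concrete with a short calculation. The following Python sketch is illustrative only: the function name and the class averages are hypothetical and are not taken from the survey.

```python
def normalized_gain(pre_pct, post_pct):
    """Average normalized gain <g> from class-average pre/post percentages."""
    if not 0 <= pre_pct < 100:
        raise ValueError("pretest average must be in [0, 100)")
    actual_gain = post_pct - pre_pct   # %<Gain>
    max_gain = 100.0 - pre_pct         # maximum possible %<Gain>
    return actual_gain / max_gain

# Hypothetical class averages (not survey data).
g_low_pretest = normalized_gain(40.0, 70.0)    # 30 / 60
g_high_pretest = normalized_gain(70.0, 85.0)   # 15 / 30
print(g_low_pretest, g_high_pretest)
```

Both hypothetical courses have <g> = 0.5 even though their actual gains differ by a factor of two, so on a plot such as Fig. 1 they would lie on the same slope line.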

b. Traditional (T) courses (passive-student lectures, recipe labs, and algorithmic-problem exams) fail to 
convey much conceptual understanding to the average student, yielding an average of the average <g>'s 
for 14 courses (N = 2084) of «g»14T = 0.23 ± 0.04sd, where sd stands for standard deviation. The 
slope line «g»14T is shown in Fig. 1. 
c. Interactive-engagement (IE) courses can be much more effective than T courses in enhancing 
conceptual understanding. Here "IE courses" are defined operationally as those reported by instructors 
to make substantial use of "IE methods." The latter are operationally defined as methods designed at 
least in part to promote conceptual understanding through interactive engagement of students in heads-on 
(always) and hands-on (usually) activities that yield immediate feedback through discussion with 
peers and/or instructors, all as judged by their literature descriptions. The 48 IE courses of the survey 
(N = 4458) yield «g»48IE = 0.48 ± 0.14sd. This is over twice «g»14T and lies almost two sd's 
of «g»48IE above that of the T courses, reminiscent of differences seen in comparing instruction 
delivered to students in large groups with one-on-one instruction (Bloom, 1984). The slope line 
«g»48IE is shown in Fig. 1. 

d. Current IE methods need to be improved, since none of the IE courses achieves <g> greater than 
0.69. In fact, as can be seen in Fig. 1, seven of the IE courses (N = 717) achieved <g>’s ranging from 
0.21 to 0.26, characteristic of T courses. It would seem that (contrary to the suggestion of Greene, 1999) 
it takes more than just an emphasis on concepts to induce high <g>. Case histories of the seven low-<g> 
courses (Hake, 1998b) suggest that implementation failures occurred that might be mitigated by: 

(1) apprenticeship education of instructors new to IE methods, 

(2) emphasis on the nature of science and learning throughout the course, 

(3) careful attention to motivational factors and the provision of grade incentives for taking IE 
activities seriously, 

(4) recognition of and positive intervention for potential low-gain students, 

(5) administration of exams in which a substantial number of the questions probe the degree of 
conceptual understanding induced by the IE methods, 

(6) use of IE methods in all components of a course and tight integration of those components. 

e. The correlation of the average normalized gain <g> with (%<pretest>) for the 62 courses of Fig. 1 is 
a very low +0.02. This constitutes an experimental justification for the use of <g> as a comparative 
measure of course effectiveness over diverse student populations with widely varying average pretest 
scores. In contrast, the average posttest score (%<posttest>) and the average actual gain (%<Gain>) are 
less suitable for comparing course effectiveness over diverse groups since the correlation of: 

(%<posttest>) with (%<pretest>) is + 0.55, and 
(%<Gain>) with (%<pretest>) is - 0.49, 

as is reasonable. Note that in the absence of instruction, a high positive correlation of (%<post>) with 
(%<pre>) would be expected. 
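
The signs of these correlations can be illustrated with an idealized case. The Python sketch below assumes a set of hypothetical courses that all share the same <g> but have different pretest averages; the pretest values and the helper `pearson_r` are illustrative assumptions, not survey data or the survey's analysis code. With <g> fixed, %<Gain> = <g>(100 - %<pretest>) falls linearly with the pretest score while %<posttest> rises linearly with it.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Idealized, hypothetical courses: different pretest averages, identical <g>.
g = 0.48
pretest = [20.0, 30.0, 40.0, 50.0, 60.0]
gain = [g * (100.0 - p) for p in pretest]          # %<Gain>
posttest = [p + d for p, d in zip(pretest, gain)]  # %<posttest>

r_gain = pearson_r(pretest, gain)      # close to -1
r_post = pearson_r(pretest, posttest)  # close to +1
print(round(r_gain, 6), round(r_post, 6))
```

In the survey itself <g> varies from course to course, which is consistent with the observed correlations (-0.49 and +0.55) being much weaker than in this idealization.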
f. Fig. 3 shows the average course scores on the problem-solving Mechanics Baseline test (Hestenes & 
Wells, 1992) [available for 30 (N = 3259) of the 62 courses of the survey] vs those on the conceptual 
FCI. There is a strong positive correlation r = + 0.91 of the MB and FCI scores. This correlation and 
the comparison of IE and T courses at the same institution (Hake, 1998a) imply that IE methods enhance 
problem-solving ability. 




Fig. 3. Average posttest scores on the problem-solving Mechanics Baseline (MB) test vs those on the 
conceptual FCI test for all courses of the survey for which data are available: thirty courses (high 
school, college, and university) which enroll a total N = 3259 students. The solid line is a least-squares 
fit to the data points. The dashed line is the diagonal representing equal scores on the MB and FCI 
tests. (From Hake, 1998a.) 

g. An analysis (Hake, 1999a) of survey results in terms of the "effect size" <d> so commonly used in 
meta-analyses (Cohen, 1988; Hunt, 1997) has been carried out. Here <d> is defined as the ratio of the 
actual average gain (%<post> - %<pre>) to the average of the sd's. I obtain an average <d> = 0.88 for 9 
T courses (N = 1620), and an average <d> = 2.18 for 24 IE courses (N = 1843) for which sd's are 
available. The latter can be compared with: (1) the similar <d> = 1.91 reported by Zeilik, Schau, 
Mattern (1998) for a single IE introductory astronomy course (N = 221) given in Spring 1995 at the 
University of New Mexico, and (2) the much smaller average <d> = 0.51 obtained in a meta-analysis of 
small-group learning by Springer, Stanne, Donovan (1999). In the Springer study, as for much research 
reported in the educational literature, (a) in many cases there was no pretesting to disclose initial 
knowledge states of the test or control groups, (b) the quality of the "achievement tests" was not 
critically examined (were they of the plug-in-regurgitation type so common in introductory physics 
courses?). I think that the Springer et al. meta-analysis probably understates the effectiveness of 
cooperative learning in advancing conceptual understanding and problem-solving ability. 
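
The effect size defined in point "g" can be sketched as follows. This is a minimal illustration under one assumption: that "the average of the sd's" means the mean of the pretest and posttest standard deviations for a course. The function name and the numbers are hypothetical and are not survey data.

```python
def effect_size(pre_mean, post_mean, pre_sd, post_sd):
    """Effect size <d>: average gain divided by the mean of the pre and post sd's."""
    mean_sd = (pre_sd + post_sd) / 2.0   # assumed meaning of "average of the sd's"
    return (post_mean - pre_mean) / mean_sd

# Hypothetical class statistics in percent (not survey data).
d = effect_size(pre_mean=40.0, post_mean=70.0, pre_sd=15.0, post_sd=17.0)
print(d)   # 30 / 16
```

Because <d> scales the gain by the spread of scores rather than by the room left for improvement, it need not rank courses in the same order as <g>.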

h. A detailed analysis of random and systematic errors has been carried out (Hake, 1996, 1998a). 

Random Error. Because the difference «g»48IE - «g»14T = 0.25 is equal to 1.8 sd's of the 
<g>IE distribution and 6.2 sd's of the <g>T distribution, the probability that the difference could be due 
to chance is extremely small. The conventional statistical analysis (Slavin, 1992; Snedecor & Cochran, 
1989, p. 83-102) yields: t = 11; df = 60, where df is the "degrees of freedom"; and p < 0.001, where p is 
the two-tailed probability that the difference in means is due to chance. I assume that the 
t-test is approximately valid, even though the <g> distributions (Fig. 2) are skewed. [Because the 
variances of the <g> distributions for the IE and T courses are markedly dissimilar (F = 12), I have 
employed an approximation to the standard t-test due to Satterthwaite as discussed by Snedecor & 
Cochran, 1989, p. 97.] 
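
The quoted statistics can be roughly checked from the rounded summary values alone. The Python sketch below uses the standard unequal-variance t statistic with the Welch-Satterthwaite degrees of freedom; it is an assumption that this matches the computation actually carried out, and the function name `welch_t` is hypothetical.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance t statistic and Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1   # squared standard error of mean 1
    v2 = sd2 ** 2 / n2   # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Rounded summary values quoted in the text: 48 IE courses and 14 T courses.
t, df = welch_t(0.48, 0.14, 48, 0.23, 0.04, 14)
variance_ratio = (0.14 / 0.04) ** 2   # ratio of the two variances
print(round(t, 1), round(df), variance_ratio)
```

With these rounded inputs the sketch gives t close to 11, about 60 degrees of freedom, and a variance ratio near 12, in line with the values quoted above.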

As discussed by Hake (1998d), physicist Robert Ehrlich thought that "....the size of the sample Hake 
used for the traditional courses was fairly small, so a statistical fluctuation was always a possibility" 
(private communication). Testing his conjecture, Ehrlich elicited pre/post FCI testing in 12 more-or-less 
traditional courses taught by instructors with whom he was acquainted. These yielded an average 
normalized gain «g» = 0.20 ± 0.06sd, consistent with my survey results for 14 T courses. 

Considering all the above, it is extremely unlikely that random error plays a significant role in the nearly 
two-standard-deviation difference of the T and IE courses. 

Systematic Error. The following systematic errors have been examined and arguments (not repeated 
here) have been advanced (Hake, 1998a) for their probable insignificance: (a) Question ambiguities and 
isolated false positives, (b) Teaching to the test and test-question leakage, (c) Varying fraction of course 
time spent on mechanics among the courses of the survey, (d) Varying post and pretest motivation of 
students among the courses of the survey, (e) Hawthorne/John Henry effects. Thus it is extremely 
unlikely that systematic error plays a significant role in the nearly two-standard-deviation difference of 
the T and IE courses. 
Considering points "a" - "h" above, the results of the survey (Hake, 1998a) can be summarized as 
follows: 

a. Fourteen T courses (N = 2084) which made little or no use of IE methods achieved an average 
normalized gain «g» = 0.23 ± 0.04sd. In sharp contrast, 48 IE courses (N = 4458) that made 
substantial use of IE methods achieved an average normalized gain «g» = 0.48 ± 0.14sd. It is extremely 
unlikely that random or systematic error plays a significant role in the nearly two-standard-deviation 
difference of the T and IE courses. 

b. A plot of average course scores on the problem-solving Mechanics Baseline (MB) test vs those 
on the conceptual Force Concept Inventory (FCI) shows a strong positive correlation r = + 0.91. 
This correlation and direct comparison of IE and T courses at the same institution imply that IE 
methods enhance problem-solving ability. 

c. The above conceptual and problem solving test results strongly suggest that the use of IE 
strategies can increase mechanics-course effectiveness well beyond that obtained with 
traditional methods. However, IE strategies and their implementation need to be improved. 

Results consistent with those of the survey have now been obtained by physics-education research 
groups at the Univ. of Maryland (Redish, Saul, Steinberg, 1997; Saul, 1998; Redish & Steinberg, 1999; 
Redish, 1999); Univ. of Montana (Francis, Adams, Noonan, 1998); Rensselaer and Tufts (Cummings, 
Marx, Thornton, Kuhl, 1999); North Carolina State Univ. (Beichner, Bernold, Burniston, Dail, Felder, 
Gastineau, Gjertsen, Risley, 1999); and Högskolan Dalarna - Sweden (Bernhard, 1999). Thus in PER, 
just as in hard-core traditional physics research, it is possible to perform quantitative experiments which 
can be reproduced (or refuted) by other investigators and thus contribute to the construction of a 
"community map" (Redish, 1999). 



Example 2. Quantitative Research: "Pinning a Kid To Her Seat" 

The number of IE courses using each of the more popular methods is - Collaborative Peer Instruction: 
48 (all courses); Microcomputer-Based Labs: 35; Concept Tests: 20; Modeling: 19; Active Learning 
Problem Sets or Overview Case Studies: 17; and Socratic Dialogue Inducing (SDI) Labs: 9. (Of course, 
relative popularity has no necessary connection with relative merit - in fact, considering that SDI labs 
came in last in popularity, there may well be a negative correlation between merit and popularity!) For 
references to these and other IE methods see Hake, 1998a, 1998b. 

As discussed by Hake (1991), Socratic Dialogue Inducing (SDI) labs (Hake, 1987; Tobias & Hake, 1988; 
Hake, 1992) were originally inspired by the work of Arons (1990) whose empirically-based ideas are 
consistent with much of the recent work in cognitive science (Bransford, Brown, Cocking, 1999). 
Although the Socratic method as portrayed in the Meno (Plato, 380 B.C.; Jowett, 1953) has been 
dismissed by some influential physics teachers (Swartz, 1994, 2000; Morse, 1994), it should be 
emphasized that the Socratic method used so successfully by myself and other physics instructors (see, 
e.g., Arons, 1990; McDermott, Shaffer, and Somers, 1994; Wells & Hestenes, 1995; Redish, Saul, and 
Steinberg, 1997) is derived not from the fictional Socrates of Plato's imagination as depicted in the 
Meno, but rather from the real historical Socrates as researched by the late great Gregory Vlastos 
(Vlastos, 1990, 1991, 1994). Vlastos (1990) wrote to me: 

I only wish I had been taught physics in the way you propose .... Though Socrates was not 
engaged in physical inquiry, your (Socratic Dialogue Inducing Lab) program is entirely in 
his spirit. 

According to Howard Gardner (1989): 

If Confucius can serve as the Patron Saint of Chinese education, let me propose Socrates as his 
equivalent in a Western educational context - a Socrates who is never content with the initial 
superficial response, but is always probing for finer distinctions, clearer examples, a more 
profound form of knowing. Our concept of knowledge has changed since classical times, but 
Socrates has provided us with a timeless educational goal - ever deeper understanding. 

Rob Reich (1998) wrote: 

The Socrates I admire can best be described as curious and confidently humble. The figure of 
Socrates symbolizes most powerfully the ideal of constant openness and eagerness to enter into 
dialogue with others. At the same time, the expectations from such dialogue are not that a final 
truth will be established .... As such, the Socratic image is attractive for the purposes of 
modeling common inquiry and forwarding, in a sense, civic participation and engagement. 

Similarly, I admire the legacy of the Socratic method for its pedagogical potency to develop 
analytical skills and a philosophical habit that should prove invaluable in a modern democracy. 
In SDI Labs students collaborate in groups of four to: 



1. Perform (often predict and then perform) simple hands-on experiments involving a BODY at 
rest or in motion. 

2. Draw "snapshot sketches" at sequential clock-readings showing (a) color-coded vectors to 
indicate ALL the forces acting ON the BODY - labeled as F on A by B, where A is the BODY and B 
is some other interacting body; (b) color-coded velocity and acceleration vectors "if they exist." 

3. As the experiments proceed, discuss with other students and then write down answers to 
questions that probe for reasoning skills and basic conceptual understanding of Newton’s laws. 
The question format is such as to require rather complete explanations, justifications, and/or 
sketches and not simply yes-or-no answers. 

4. If stumped or confused on any of the above (after serious effort and discussions with other 
students) engage in Socratic dialogue with an instructor after signaling for help. 



For example, in an SDI Lab on Newton's Second Law (Hake, 1998c) students are presented with this 
(thought) experiment: 





A. Recalling your work in SDI #1, can a truck driver pin a kid to her seat by driving his truck at a 
very high constant velocity v as suggested in the cartoon above? { Y, N, U, NOT} 

[SDI Lab Rule #11: "The lab manual questions are designed to help you THINK about the experiments 
and how they relate to Newton's laws. You will often be asked to predict the outcome of an experiment 
and then perform that experiment. A curly bracket { } indicates that you should ENCIRCLE a 
response within the bracket and then, we INSIST, briefly EXPLAIN or JUSTIFY your answers in the 
space provided on these sheets. The letters { Y, N, U, NOT} stand for {Yes, No, Uncertain, None Of 
These}."] 
B. If you were a truck driver and wished to pin a kid to her seat (by control of the truck motion) what 
might you do? (Three time-sequential snap-shot sketches with force F, velocity v, and acceleration a 
vectors and clocks are worth 10 terawords.) [HINT: It may help to consider the horizontal motion of a 
disk in your hand.] 

Videotape PSTl-4a shows Pat, Mary, June, and Doug tackling the above questions. After discussing 
question "A" regarding an earlier disk-carry experiment, they correctly decide that a truck driver could not 
pin a kid to her seat by driving his truck at a very high constant velocity v. They encircle "N" and justify 
their choices in writing in their lab manuals. Moving on to question "B", the students finally conclude that 
the driver could pin a kid to her seat by giving the truck a constant forward acceleration. They correctly 
draw snapshot sketches showing v(kid) increasing with time; a(kid) = constant ≠ 0; equal and opposite vertical 
forces F on kid by seat and F on kid by Earth; and a constant forward horizontal force F on kid by seat. Mary also shows a 
backward horizontal "pinning force" F on kid, thinking it's the "force due to acceleration." 

Socrates: Now you have here F on kid by seat. That's great, but I'm really worried about that force 
(points to force on the kid in the backwards direction) .... is that really there? 

Mary: Well, it's gotta be ... that's what's pin'n it back. 

Doug: Not in the first part where there's no 

Mary: (excitedly) NO!! IT’S NOT!!! IT'S NOT THERE!! 

But Doug maintains otherwise: "it's there, it's F on seat by kid," and explains correctly that "it exists by 
Newton's Third Law but we're not concerned with ... (the seat)." Mary concludes "that's tricky!" Socrates 
agrees. 
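
The conclusion the students reach can be checked by applying Newton's Second Law to the kid in the horizontal direction. The Python sketch below is illustrative only; the kid's mass and the truck's acceleration are hypothetical values, and the function name is an assumption.

```python
def horizontal_force_on_kid(mass_kg, truck_accel_ms2):
    """Net horizontal force on the kid (F on kid by seat), F = m * a, in newtons."""
    return mass_kg * truck_accel_ms2

KID_MASS_KG = 30.0  # hypothetical value, for illustration only

# Question A: a very high but constant velocity means zero acceleration,
# so there is no horizontal force on the kid at any speed.
force_constant_velocity = horizontal_force_on_kid(KID_MASS_KG, 0.0)

# Question B: constant forward acceleration (hypothetical 2.0 m/s^2)
# requires a constant forward force on the kid by the seat.
force_accelerating = horizontal_force_on_kid(KID_MASS_KG, 2.0)

print(force_constant_velocity, force_accelerating)
```

The only horizontal force on the kid is the forward F on kid by seat. Its Third Law partner, F on seat by kid, points backward but acts on the seat, so it does not belong in the kid's force diagram, which is the point Doug makes.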

As for all SDI lab experiments, the selection and phrasing of Socratic questions such as "A" and "B," 
and the activities surrounding them have been developed by (a) qualitative research (e.g., videotape 
analysis, student interviews, instructor discussions) extending over many years, and (b) long-term 
classroom use, feedback, assessment, research analysis, and redesign. (Wilson & Daviss, 1994) 

The analysis of such videotapes has indicated the ways students tend to think about mechanics topics 
such as "pinning a kid to her seat" and prompted gradual improvements in SDI labs so as to guide 
students more effectively to construct their understandings in accord with the Newtonian view - 
"scientific constructivism" (Redish, 1999). 
Complementarity of Qualitative and Quantitative Research 

Over a ten-year period at Indiana University, SDI labs were integrated into courses in which lectures, 
discussions, and exams emphasized conceptual understanding and interactive engagement. Lectures 
usually employed a standard textbook and back-of-chapter problem assignments, and discussions were 
devoted to cooperative group problem solving with Socratic guidance. The courses enrolled a total of 
1263 students (primarily pre-med and pre-health professionals) and achieved an average <g> on the 
conceptual Halloun-Hestenes tests of 0.60 (Hake, 1998b), considerably higher than the average 
<g> = 0.47 of the other interactive-engagement courses considered in the survey (Hake, 1998a,b). 

Thus the qualitative research which has improved SDI labs has contributed to their effectiveness as 
shown by the quantitative survey results. Then too: 

a. The Halloun-Hestenes tests used for the survey were themselves developed by painstaking 
qualitative research involving the analysis of students' verbal responses to open-ended, conceptually- 
oriented questions. 

b. The Maryland, Montana, Rensselaer/Tufts, and North Carolina State groups referred to above are 
examples of physics-education-research teams noted for their synergistic use of both quantitative and 
qualitative research methods. 

c. A critical review and listing of physics-education research articles over the past four decades 
(McDermott & Redish, 1999) shows a mix of mutually supportive quantitative and qualitative work. 
(This work has been largely ignored by the physics and education communities.) 

However, "paradigm peace" has not yet been achieved in physics-education research. For example: 

(a) Skepticism regarding the utility of quantitative research for the design of instructional materials 
is sometimes expressed (Pride, Vokos, and McDermott, 1998). 

(b) For many research physicists the term "qualitative research" is an oxymoron. 

(c) Some “radical constructivists” are uncomfortable with quantitative tests such as the FCI that fail 
to reward students for the construction of understandings that do not coincide with those of 
professional physicists. 

(d) It has been argued (van Aalst, 2000) that from the standpoint of the history and sociology of 
science, PER "produces knowledge that is qualitatively (especially ontologically) different from 
knowledge produced by physics research, and to represent it as a subfield of physics [as does 
Redish, 1999 (and the present author)], is a distortion." 

(e) It can be inferred from the thesis of McGinn & Roth (1999) (who would seem to qualify as 
"antinatural critical theorists" in Gage’s taxonomy) that PER as described above is irrelevant for 
preparing students for "competent scientific practice" because it is disconnected from "authentic 
science" as revealed by "empirical research in science and technology studies." 
Prediction for the Year 2009 

On the basis of (a) the present evidence for the effectiveness and complementarity of quantitative and 
qualitative PER, (b) the present state of PER (McDermott & Redish, 1999), and (c) increasing 
interdisciplinary synergy [fostered by, e.g., the web (Hake, 1999b; Roschelle & Pea, 1999; Pea, 2000) and 
organizations such as AERA (< http://www.aera.net >, witness this meeting) and AAHE 
< http://www.aahe.org/ >], I shall predict that for PER, and possibly even education research generally, 
there will be "a productive rapprochement of the paradigms" by the year 2009, in accord with Gage's 
scenario #2. Some will follow paths of pragmatism or "Popper's piecemeal social engineering" to this 
paradigm peace, as suggested by Gage. However, most will enter onto this "sunlit plain" from the path 
marked "scientific method" as practiced by most research physicists: 

(1) "EMPIRICAL: Systematic investigation (by quantitative, qualitative, or any other 
means) of nature to find reproducible patterns in the structure of things and the ways they 
change (processes). 

(2) THEORETICAL: Construction and analysis of models representing patterns of nature." 
(Hestenes, 1999). 

(3) "Continual interaction, exchange, evaluation, and criticism so as to build a community 
map" (Redish, 1999). The latter crucial feature of the scientific method has also been 
emphasized by Ziman (1978), Cromer (1997), Gere (1997), Gottfried & Wilson (1997), and 
Newton (1997). 